Introduction
Why Should You Care About AI Used for Hiring?


The health and vibrancy of our national economy depend on the health and vibrancy of our organizations. How an organization is staffed is a key driver of its health. Yet only one in three executives rates his or her company as “very effective” at reducing unsuccessful hiring decisions (Bravery et al., 2019). Industrial-organizational (I-O) psychologists help close this gap. They help organizations thrive by choosing the right talent for the right job at the right time, using the most accurate, cost-effective, and fair tools available (Ployhart, Schmitt, & Tippins, 2017).


Traditionally, I-O psychology identifies the best person for the job in two ways. It makes common selection tools like the resumé or interview more scientific, or it develops technical screening tools such as psychometric assessments. Today, artificial intelligence (AI) is being promoted as a tool that screens job candidates more quickly and easily. In other words, AI is now being used to supplement or replace traditional methods that measure the individual differences that predict future job performance.


As hiring experts, I-O psychologists have a responsibility to evaluate AI in this use case. AI makes several hopeful promises for more efficient, effective, and enjoyable talent selection. However, many topics relevant to traditional testing also deserve future research attention for AI tools: reliability, validity, fairness, transparency, acceptance by job seekers, and legality. I-O psychologists can turn to the Principles for the Validation and Use of Personnel Selection Procedures (5th ed.; SIOP, 2018). This document covers these key issues, which remain highly relevant to AI tools.


All of that to say, AI is a megatrend that is already changing how organizations function, especially when it comes to hiring talent. We want to help you adapt to the changing technology in this critical area. To paraphrase Thomas Siebel, founder of an AI company and author of Digital Transformation: AI is an oncoming train, and you’re either on it or you’re on the track (LeVine, 2019).


Therefore, the goals of this paper are to dispel some of the mystique that surrounds AI used for hiring, while also encouraging confident investment in AI tools that can help your organization succeed through more effective talent assessment and selection.


Background 
What Is AI?


When the term “AI” is invoked, it often comes with a lot of hype, jumbled meanings, and unrealistic expectations. So, it is helpful to start with a definition: AI is the collection of technologies and algorithms that imitate one or more of the following human abilities (Barney, 2019):


• natural language processing: understanding and communicating in a human language;
• knowledge representation: creating models based on what a system learns or perceives;
• reasoning: using data to answer questions or draw new conclusions;
• learning: adapting to new situations and extrapolating patterns;


SIOP White Paper Series


• sensing: perceiving objects or people;
• object manipulation (in some cases): moving objects in a physical space.


This definition is important because we must remember that AI is not “one thing.” Tools rightly billed as “AI” normally have several machine learning or deep learning models working together behind the scenes. These models learn from inputs that are either labeled by human labelers (supervised learning) or unlabeled (unsupervised learning). They create outputs that are continuous or categorical (i.e., predicting a number or a category/group). In other words, much of what AI is actually “doing” can be broken down into these functions:


• classification: assigning things to a group based on their similarity to previously labeled groups;
• clustering: determining potential groups from unlabeled data;
• regression: predicting a number based on a known relationship;
• identifying patterns between variables: experimenting with potential relationships within data to discover patterns.


                 Supervised          Unsupervised
Categorical      Classification      Clustering
Continuous       Regression          Identifying patterns

Figure adapted from Soni (2018)
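To make the functions above more concrete, here is a minimal Python sketch of three of them using toy, hypothetical data (invented assessment scores and labels). It uses deliberately simple techniques: a nearest-centroid classifier, ordinary least-squares regression, and one-dimensional k-means clustering. These are not the models a commercial AI hiring tool would necessarily use, and identifying patterns is omitted.

```python
from statistics import mean


def classify_nearest_centroid(train, x):
    """Supervised, categorical output: assign x to the label whose mean score is closest.

    train is a list of (score, label) pairs that a human has already labeled.
    """
    labels = {label for _, label in train}
    centroids = {lab: mean(s for s, lbl in train if lbl == lab) for lab in labels}
    return min(centroids, key=lambda lab: abs(centroids[lab] - x))


def fit_line(xs, ys):
    """Supervised, continuous output: ordinary least-squares slope and intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def kmeans_1d(values, k=2, iters=10):
    """Unsupervised, categorical output: find k groups in unlabeled scores."""
    # Start from evenly spaced values in sorted order.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(centers[i] - v))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical examples
train = [(1, "low"), (2, "low"), (9, "high"), (10, "high")]
print(classify_nearest_centroid(train, 8))  # high
print(fit_line([1, 2, 3], [2, 4, 6]))       # (2.0, 0.0)
print(kmeans_1d([1, 2, 3, 10, 11, 12]))     # [2, 11]
```

The classifier and the regression both learn from labeled examples. The clustering step receives only raw scores and has to discover the two groups on its own, which is the supervised/unsupervised split shown in the figure.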


AI might seem like brand-new technology, but much of the math and computer science behind AI and machine learning (AI’s equally famous subcategory) has been around for several decades. What has changed recently is that computing costs are down, data volume and velocity are up, and the number of computations possible on a single chip has never been higher.

Despite recent progress and attention, a human-like AI (also called a “general” or “strong” AI) is still science fiction. However, “narrow” or “weak” AIs have recently made eye-popping headlines in the Wall Street Journal, New York Times, and Washington Post. These are AIs trained to complete specific tasks, such as recognizing images, playing games, and making recommendations. These use-case-driven AIs have made notable accomplishments in fields such as medicine, transportation, manufacturing, law, and finance. Yet, according to Garry Mathieson, co-chair of the AI industry group at Littler law firm, “HR is actually late to the game” (Brin, 2019). HR might be a new use case for AI. Still, in a 2019 survey of 7,300 professionals representing nine industries, 40% said their organizations are already using AI to screen or assess job candidates during recruitment (Bravery et al., 2019).

What Are the Promises of AI for Assessing and Selecting Talent?


AI makes bold promises to overcome many problems inherent to finding, engaging, and assessing job seekers.


To summarize, when applied to hiring, AI claims to:


1. Automate repetitive and cumbersome recruiting tasks. An early success story has been how AI automates manual recruiter tasks such as sourcing candidates and screening resumés. This is where most of the AI-based hiring market focuses today. Several vendors have sprung up to offer this service. They tout AI-based technologies that source potential job candidates by screening and assembling social media and other online data. This technology blurs the line between identifying and assessing job candidates. Even so, initial scientific research shows that AI systems not only find candidates but also reliably assess individual differences, such as personality, to match candidates to jobs (Akhtar et al., 2019; Morelli & Illingworth, 2019).

2. Make it possible to work with data that are massive, decentralized, “noisy,” and otherwise unstructured. AI can more easily and efficiently sift through massive datasets, which are often housed in systems spread across an organization (EY, 2018). AI is better both at processing large amounts of data and at reducing the time it takes humans to clean and prepare data for additional analyses. Data scientists spend roughly 60% of their time cleaning and organizing data (Press, 2016). For that reason, AI systems that can clean “noisy” and unstructured data can increase HR analysts’ effectiveness and productivity.

3. Model more variables (features) than current self-report assessment methods. Traditional assessment approaches often “leave data on the table.” In other words, models based on traditional self-report assessment data typically include about a dozen variables. Failing to collect and include other valuable information from candidates might reduce both the model’s predictive power and the quality of the candidate experience. Conversely, deep learning is an AI approach that can handle zettabytes of data and thousands of features in a model. This potentially increases a model’s predictive power while reducing candidate time and effort. Deep learning is most often applied to image and speech recognition. However, assessments that use video and speech data can now incorporate hundreds or thousands of features into their algorithms with the help of deep learning, in an attempt to enhance traditional selection methods.

4. Help I-O psychologists create and validate traditional assessments. If an “AI-first” solution isn’t appealing, Barney (2019) describes how AI can make creating or validating traditional assessments easier and cheaper. For instance, I-O psychologists can use AI to conduct traditional job analyses efficiently and thoroughly. Job analyses are the bedrock data-collection activities that define what tasks are performed on a job. They also define what key characteristics (knowledge, skills, abilities, and other characteristics, or KSAOs) are required to perform those tasks. AI can also supplement how I-O psychologists perform targeted meta-analyses (quantitative summaries of studies) that help streamline how assessments get validated. AI can even perform analyses on its own to determine an assessment’s psychometrics; that is, AI can compute the statistics that suggest an assessment is reliable and accurate.
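One example of the psychometric statistics mentioned above is Cronbach’s alpha, a widely used index of internal-consistency reliability. The Python sketch below computes alpha from a matrix of item responses. The responses are hypothetical. This is the generic textbook formula, not a specific procedure described by Barney (2019).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                      # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])  # variance of each person's total score
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 4-person, 3-item assessment (1-5 ratings)
responses = [[3, 4, 3], [4, 5, 4], [2, 3, 3], [5, 5, 4]]
print(round(cronbach_alpha(responses), 3))  # 0.923
```

An automated tool could run a check like this every time new response data arrive. Interpreting the result, and choosing which reliability evidence matters for a given use, still calls for professional judgment.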

Ask yourself: what things would be better if they were done 24/7?
What would be better if it were done at scale? What would benefit
from greater consistency? What would be possible if we leveraged
broader expertise to see beyond our current limits?
These are good candidates for AI.


—Deborah Bubb 
VP and Chief Leadership, Learning & Inclusion Officer 
IBM (Guenole & Feinzig, 2018) 


Implications for Practice 
What Is a Specific, Viable Example of AI Applied to Talent Assessment and Selection?


Natural language processing (NLP), or AI-based interpretation and application of human language, is being used in large-scale attempts to identify and select talent effectively from text-based data. NLP is much more than keyword matching; it can identify themes and relationships within and across text passages. One AI company has already used NLP to process millions of resumés and CVs. It clusters people based on their similarity to a job profile defined by a referent individual or job posting. Matches can be searched by adding and subtracting profiles. This extends the search beyond keyword matching to a profile “analogy,” meaning the characteristic and experience combinations recruiters and leaders often use to describe an ideal candidate (May, 2016).
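The “adding and subtracting profiles” idea works like word-vector analogies. If each profile is represented as a numeric vector, a query can be built with vector arithmetic and matched to the nearest remaining profile by cosine similarity. The Python sketch below assumes tiny, hand-made three-dimensional profile vectors; the dimension meanings and values are hypothetical. A real system would learn much larger vectors from text, and this is not the cited company’s implementation.

```python
import math

# Hypothetical profile vectors: [quantitative skills, predictive modeling, HR domain knowledge].
PROFILES = {
    "data_analyst":     [0.9, 0.1, 0.2],
    "data_scientist":   [0.9, 0.8, 0.2],
    "hr_generalist":    [0.1, 0.1, 0.9],
    "recruiter":        [0.2, 0.2, 0.8],
    "ml_engineer":      [0.7, 0.9, 0.1],
    "people_analytics": [0.5, 0.8, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy_search(plus, minus, profiles, top_n=1):
    """Add the 'plus' profiles, subtract the 'minus' ones, return the closest other profiles."""
    dim = len(next(iter(profiles.values())))
    query = [0.0] * dim
    for name in plus:
        query = [q + x for q, x in zip(query, profiles[name])]
    for name in minus:
        query = [q - x for q, x in zip(query, profiles[name])]
    used = set(plus) | set(minus)
    candidates = [name for name in profiles if name not in used]
    candidates.sort(key=lambda name: cosine(query, profiles[name]), reverse=True)
    return candidates[:top_n]


# "Like a data scientist, minus the pure analyst background, plus HR expertise"
print(analogy_search(["data_scientist", "hr_generalist"], ["data_analyst"], PROFILES))
# ['people_analytics']
```

Cosine similarity compares the direction of the vectors, not their length, so a query built from several added profiles can still match a single stored profile.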


Scientific studies have shown how NLP and machine learning can create structured data from unstructured resumés, which is otherwise a cumbersome and error-prone data-entry task for recruiters. Structured resumé data help to rank candidates according to a desired job posting (Sadiq, Ayub, Narsayya, Ayyas, & Tahir, 2016). A growing body of research also shows that personality can be assessed by mining social media and online profiles such as LinkedIn. Some studies have suggested that personality traits assessed from social media are related to work performance (Akhtar, Winsborough, Lovric, & Chamorro-Premuzic, 2019). These researchers argue that this is an exciting prospect for talent assessment for three reasons:

• online data reflect offline behavior;
• they represent a more complete picture of someone’s personality than data from self-report measures;
• they can systematically be tied to job performance.

Privacy issues are certainly not to be ignored here. Still, the promise of NLP lies in analyzing enormous amounts of data in a standardized way that removes the subjectivity inherent in casually vetting social media profiles, interviews, and personal CVs.
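Below is a rough sketch of turning resumé text into structured records and ranking them. It assumes a hypothetical fixed skill vocabulary and uses simple regular expressions to pull out an email address, skills, and years of experience. It then ranks candidates by the share of required skills they list. Real systems, including those cited above, rely on NLP and machine learning rather than hand-written patterns. This sketch only shows the shape of the pipeline: unstructured text in, structured records and a ranking out.

```python
import re

# Hypothetical skill vocabulary; a real parser would learn skills from data.
SKILL_VOCAB = {"python", "sql", "statistics", "recruiting", "excel", "tableau"}
YEARS_PATTERN = re.compile(r"(\d+)\+?\s+years?", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def parse_resume(text):
    """Turn free-text resume content into a structured record."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    years = [int(y) for y in YEARS_PATTERN.findall(text)]
    email = EMAIL_PATTERN.search(text)
    return {
        "email": email.group(0) if email else None,
        "skills": sorted(SKILL_VOCAB & words),
        "years_experience": max(years, default=0),
    }


def rank_candidates(resumes, required_skills, min_years=0):
    """Rank parsed resumes by share of required skills matched, then by experience."""
    required = set(required_skills)

    def score(record):
        matched = len(required & set(record["skills"])) / len(required)
        return (matched, record["years_experience"])

    records = [parse_resume(text) for text in resumes]
    eligible = [r for r in records if r["years_experience"] >= min_years]
    return sorted(eligible, key=score, reverse=True)


resumes = [
    "John Roe, john@example.org. 2 years of recruiting; advanced Excel.",
    "Jane Doe, jane@example.com. 5 years of Python and SQL analysis.",
]
for record in rank_candidates(resumes, ["python", "sql", "statistics"]):
    print(record)
```

Keyword rules like these are brittle. They miss synonyms and context, which is exactly the gap the NLP approaches described above aim to close.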


NLP has also been used to score open-ended assessment responses (Campion, Campion, Campion, & Reider, 2016). The scientific and vendor-based research shows that text data can be mined and understood for their content, whether the text comes from written essays or is transcribed from video or audio. This increases both an assessment’s efficiency and the number of features that can be added to a model. Combined with easier scoring, NLP of open-ended responses makes it possible to create more engaging and realistic assessment simulations. These simulations can adapt dynamically to candidate responses, allowing scenarios to look and feel like the job while providing a personalized experience to the candidate.
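One simple way to score open-ended responses is to compare a new response with responses that trained raters have already scored, then borrow the ratings of the closest matches. This is k-nearest-neighbor regression on bag-of-words vectors. The sketch below assumes a small, hypothetical set of rated answers to a customer-service prompt. It is not the method used by Campion et al. (2016), and an operational system would need far more rated data and proper validation.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens in a response."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_score(response, rated_examples, k=2):
    """Predict a score as the mean human rating of the k most similar rated responses."""
    vec = bag_of_words(response)
    neighbors = sorted(rated_examples,
                       key=lambda ex: cosine(vec, bag_of_words(ex[0])),
                       reverse=True)[:k]
    return sum(rating for _, rating in neighbors) / len(neighbors)


# Hypothetical answers to "How would you handle an upset customer?" rated 1-5 by trained raters
rated = [
    ("I would listen to the customer and offer a refund", 5),
    ("I would listen carefully and apologize to the customer", 4),
    ("I would ignore the complaint", 1),
    ("Tell the customer it is not my job", 1),
]
print(knn_score("I would listen to the customer and apologize", rated))  # 4.5
```

Because the scores come entirely from human ratings, any bias in those ratings carries straight into the automated scores.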


Finally, NLP is behind conversational chatbots that help newly hired employees get up to speed faster. For example, each year Unilever hires 30,000 people out of nearly 2 million applications. Unilever uses machine learning to assess aptitudes and match candidates to jobs. It also onboards new hires with an NLP-powered chatbot that answers simple questions and retrieves information through dialogue (Marr, 2018). Although the chatbot is still in its early phases, over 80% of the regions that have implemented it said they would continue to use it in their onboarding process.
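A production chatbot would use much richer language models. Still, the core retrieval step can be sketched with simple word overlap: match a new hire’s question to known questions and return the stored answer. The FAQ content, the Jaccard-similarity matching, and the 0.2 hand-off threshold below are all hypothetical choices for illustration.

```python
import re

# Hypothetical onboarding FAQ (question -> answer)
FAQ = {
    "When is payday?": "Employees are paid on the last business day of each month.",
    "How do I reset my password?": "Use the self-service portal linked on the intranet home page.",
    "Where do I find the benefits guide?": "The benefits guide is in the HR section of the intranet.",
}
FALLBACK = "Sorry, I don't know that one yet. I'll pass your question to HR."


def tokens(text):
    """Lowercase word set for a question."""
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(question, faq=FAQ, threshold=0.2):
    """Return the stored answer whose FAQ question has the highest Jaccard similarity."""
    asked = tokens(question)
    best_reply, best_sim = FALLBACK, 0.0
    for known_question, reply in faq.items():
        known = tokens(known_question)
        union = asked | known
        sim = len(asked & known) / len(union) if union else 0.0
        if sim > best_sim:
            best_reply, best_sim = reply, sim
    # Hand off to a human when nothing is similar enough.
    return best_reply if best_sim >= threshold else FALLBACK


print(answer("How can I reset my password"))
print(answer("What is the dress code?"))
```

The fallback reflects a design choice: when the bot is unsure, it routes the question to a person rather than guessing.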


What Are the Open Questions and Obstacles Related to AI-Based Hiring Tools?


As with any new tool or technology, organizations need to understand the logistical and ethical implications before blindly adopting AI for hiring. Fortunately, many computer scientists and I-O psychologists have already developed important questions for organizational decision-makers to consider.


For example, many I-O psychologists evaluating AI hiring tools, especially tools that incorporate deep learning models, have asked a conceptual question: Are the predictors or variables inside AI algorithms job related? In other words, do we really understand the variables that are being combined in a deep learning model? These questions are often referred to as the “black box” problem, or the inability to interpret a multilayered, data-driven AI model. Models that cannot be interpreted are harder to defend in court and harder to explain to business stakeholders and job candidates. Both issues are very important in assessment and selection. However, not all AI solutions should be dismissed as “black boxes.” In some cases, interpretability might not be the chief goal. Dismissing or discounting AI as a “black box” also ignores recent efforts to make models more interpretable (Landers, 2019). For instance, a company whose stated mission is to “make AI explainable” just raised $30 million from venture capitalists to grow the business, a tangible sign of work in this area.
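One family of interpretability techniques treats the model as a black box and asks which inputs its predictions depend on. The sketch below implements permutation importance, a common model-agnostic method. It shuffles one feature at a time and measures how much accuracy drops. The “model” and applicant features are hypothetical. This is offered as one example of explainability work, not as the approach of any source cited here.

```python
import random


def accuracy(model, X, y):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy when each feature column is shuffled.

    Larger values mean the model leans more heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical "black box" that secretly uses only feature 0 (e.g., a work-sample score)
def black_box(row):
    return int(row[0] > 0.5)


X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.5], [0.9, 0.3], [0.3, 0.7], [0.7, 0.2]]
y = [black_box(row) for row in X]
print(permutation_importance(black_box, X, y))
```

Because the hypothetical model ignores feature 1, shuffling that column never changes its accuracy, so feature 1’s importance is exactly 0. Feature 0 receives a positive score. An auditor could then ask whether the features that matter are job related.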


Ethical questions often asked are: Are AI hiring tools biased? If so, are AI hiring tools legally defensible? Research has demonstrated that models trained on text, such as those used in NLP-based tools, contain the semantic or historical biases present in the text itself (Caliskan, Bryson, & Narayanan, 2017). The EEOC has not brought any court cases on this issue (yet). Even so, many in the field have warned about the biases that can creep in from the data used to train models. For example, Amazon suffered negative PR from an AI resumé-screening tool that was biased against women (Dastin, 2018). Bias will be an important, ongoing issue for developers and practitioners to consider. Some researchers are already charting a course to correct or mitigate inherent bias in AI models (Veale & Binns, 2017).
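A common first check for the kind of bias described above is the “four-fifths rule” from the U.S. Uniform Guidelines on Employee Selection Procedures (1978). Under this rule, a group whose selection rate is less than 80% of the highest group’s rate is generally treated as evidence of adverse impact. The Python sketch below computes those impact ratios from hypothetical applicant counts. It is a rule of thumb only, not a complete legal test.

```python
def impact_ratios(outcomes):
    """outcomes: {group: (number_selected, number_of_applicants)}.

    Returns each group's selection rate divided by the highest group's selection rate.
    """
    rates = {group: selected / applied for group, (selected, applied) in outcomes.items()}
    highest = max(rates.values())
    return {group: rate / highest for group, rate in rates.items()}


def four_fifths_flags(outcomes, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths (80%) guideline."""
    return sorted(g for g, ratio in impact_ratios(outcomes).items() if ratio < threshold)


# Hypothetical screening results from an AI resume filter
outcomes = {"men": (48, 80), "women": (12, 40)}
print(impact_ratios(outcomes))      # {'men': 1.0, 'women': 0.5}
print(four_fifths_flags(outcomes))  # ['women']
```

Running a check like this on a tool’s pass rates before and after deployment is one concrete way to catch the kind of bias reported in the Amazon case.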


Finally, if the goal of using AI tools is to hire the right candidate at the right time, how do candidates react to algorithmic hiring? A recent Pew Research Center study suggests candidates are worried (Smith & Anderson, 2017). In this study, over 4,000 respondents reacted to a hypothetical scenario in which “computer programs may be able to provide a systematic review of each [job] applicant without the need for human involvement.” Two-thirds (67%) said that this scenario worries them, and three out of four said that they would not apply to a job that used an algorithm to make a hiring decision. Attitudes and perceptions can change quickly. Even so, these numbers reveal people’s uneasiness with humans being completely removed from the decisions that affect their livelihoods. How to overcome the fear and uncertainty surrounding AI hiring tools is a difficult and open question for practitioners and developers to answer.


Next Steps 
What Should a Savvy Consumer Do Before Using an AI Hiring Tool?
1. Get I-O psychologists involved. It’s true that new developments in AI-based hiring often come from computer scientists and software engineers, but you don’t need to rely solely on engineers to solve talent selection challenges. Hire your next data scientist with I-O psychology training and expertise in hiring. Reach out to an I-O psychology consultant to help vet a potential AI vendor, or choose to work with companies that employ I-O psychologists. Having an I-O psychology perspective at the table can help you ask the right questions. That way you can make sure your chosen AI hiring tool is relevant, effective, ethical, and legal. This advice echoes calls from IT leaders for humans to be “in the loop,” providing cyclical input during a model’s development and tuning, so that negative or biased AI-based decisions can be avoided (Persson & Kavathatzopoulos, 2017; Rahwan, 2017).

2. Pair AI-based tools with human decision makers. AI has a lot to offer in terms of efficiency and accuracy, but we can’t expect people to be completely removed from a hiring decision. Rather than remove managers or other stakeholders from an AI-infused hiring process, consider how you might involve them. This “human plus machine” approach is supported by research. That research shows that combining algorithms and human judgment predicts future job performance more accurately than human judgment alone (Kuncel, Klieger, Connelly, & Ones, 2013).

3. Apply a healthy amount of skepticism to marketing materials, and ask for specifics. Even if you don’t understand all of the answers, asking for specifics about “how the sausage is made” helps you detect vague or contradictory answers from vendors. It also lets you compare across vendors. Thinking critically about technology that is often marketed as a magic wand will help you work with vendors that can deliver on realistic promises and add value.
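To make the “human plus machine” idea concrete, the sketch below mechanically combines an AI tool’s score with a structured interviewer rating. Each is standardized (converted to z-scores), and the two are averaged with a fixed weight. The candidates, scores, and 50/50 weight are hypothetical. In practice the weights would be chosen from local validation evidence, and this is not the procedure from Kuncel et al. (2013).

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0, SD 1 (all zeros if there is no variance)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]


def combine(candidates, algo_weight=0.5):
    """Rank candidates by a weighted composite of algorithm score and human rating.

    candidates: list of (name, algorithm_score, interviewer_rating) tuples.
    algo_weight: hypothetical weight on the algorithm; 1 - algo_weight goes to the human rating.
    """
    names = [c[0] for c in candidates]
    algo_z = zscores([c[1] for c in candidates])
    human_z = zscores([c[2] for c in candidates])
    composite = [algo_weight * a + (1 - algo_weight) * h for a, h in zip(algo_z, human_z)]
    return sorted(zip(names, composite), key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates: (name, AI tool score 0-100, structured interview rating 1-5)
candidates = [("Ana", 80, 4), ("Ben", 60, 5), ("Cy", 40, 2)]
for name, score in combine(candidates):
    print(name, round(score, 2))
```

Standardizing first keeps the 0-100 algorithm scale from swamping the 1-5 interview scale. Using a fixed formula, rather than an informal overall impression, is what makes the combination “mechanical.”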


AI applied to assessing and selecting talent offers some exciting promises. For organizations, it could make hiring decisions less costly and more accurate. For job seekers, it could make the process less burdensome and (potentially) fairer. However, HR is “late to the AI game.” In these early days, it’s important to understand that there is plenty of hype surrounding AI-based tools. Equipped with the right questions and with knowledge that removes some of AI’s mystique, you can be a more informed AI user and a better decision maker for your organization.


Questions to ask a vendor when evaluating an AI hiring tool:

1. How do you model the task-based and team-based requirements of the job? 

2. How do you define the human skills and traits that are relevant for selecting job applicants and predicting performance?

3. How have you validated the tool in a way that complies with legal and professional 
standards? 

4. What specific AI methods are employed in your product (e.g., regression or classification)?

5. How are data gathered and prepared when developing the model(s)? 

6. What empirical evidence can you share that supports the reliability, validity, and fairness of your AI tools (e.g., test-retest reliability, validity coefficients, and group mean differences)?

7. Have you compared your results with traditional alternatives or other AI tools? How confident can I be that your results will apply in my organizational setting?

8. At what point are humans involved in the final deliverable or outcome? 
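The statistics named in question 6 are easy to compute yourself when checking a vendor’s answer. The Python sketch below shows two of them. The first is a Pearson correlation, which serves both as a test-retest reliability estimate (scores at time 1 vs. time 2) and as a validity coefficient (scores vs. later job performance). The second is Cohen’s d, one standard way to express a group mean difference in pooled standard-deviation units. All data are hypothetical. What counts as an acceptable value depends on context and on professional standards such as the Principles.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled sample SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical data: the same five candidates assessed twice
time1 = [70, 82, 65, 90, 75]
time2 = [72, 80, 68, 88, 77]
print(round(pearson_r(time1, time2), 2))  # test-retest reliability estimate
print(cohens_d([2, 4, 6], [1, 3, 5]))     # 0.5
```

The same `pearson_r` function gives a validity coefficient when the second list holds job performance ratings instead of retest scores.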